# importing the necessary Python libraries and the dataset:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data = pd.read_csv("transformed_data.csv")
data2 = pd.read_csv("raw_data.csv")
data
| | CODE | COUNTRY | DATE | HDI | TC | TD | STI | POP | GDPCAP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | 2019-12-31 | 0.498 | 0.000000 | 0.000000 | 0.000000 | 17.477233 | 7.497754 |
| 1 | AFG | Afghanistan | 2020-01-01 | 0.498 | 0.000000 | 0.000000 | 0.000000 | 17.477233 | 7.497754 |
| 2 | AFG | Afghanistan | 2020-01-02 | 0.498 | 0.000000 | 0.000000 | 0.000000 | 17.477233 | 7.497754 |
| 3 | AFG | Afghanistan | 2020-01-03 | 0.498 | 0.000000 | 0.000000 | 0.000000 | 17.477233 | 7.497754 |
| 4 | AFG | Afghanistan | 2020-01-04 | 0.498 | 0.000000 | 0.000000 | 0.000000 | 17.477233 | 7.497754 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50413 | ZWE | Zimbabwe | 2020-10-15 | 0.535 | 8.994048 | 5.442418 | 4.341855 | 16.514381 | 7.549491 |
| 50414 | ZWE | Zimbabwe | 2020-10-16 | 0.535 | 8.996528 | 5.442418 | 4.341855 | 16.514381 | 7.549491 |
| 50415 | ZWE | Zimbabwe | 2020-10-17 | 0.535 | 8.999496 | 5.442418 | 4.341855 | 16.514381 | 7.549491 |
| 50416 | ZWE | Zimbabwe | 2020-10-18 | 0.535 | 9.000853 | 5.442418 | 4.341855 | 16.514381 | 7.549491 |
| 50417 | ZWE | Zimbabwe | 2020-10-19 | 0.535 | 9.005405 | 5.442418 | 4.341855 | 16.514381 | 7.549491 |
50418 rows × 9 columns
The dataset that we are using here consists of two data files. One file contains the raw data, and the other contains the transformed data. We have to use both files for this task, as each holds equally important information in different columns. So let’s have a look at both datasets one by one:
data.head()
| | CODE | COUNTRY | DATE | HDI | TC | TD | STI | POP | GDPCAP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | 2019-12-31 | 0.498 | 0.0 | 0.0 | 0.0 | 17.477233 | 7.497754 |
| 1 | AFG | Afghanistan | 2020-01-01 | 0.498 | 0.0 | 0.0 | 0.0 | 17.477233 | 7.497754 |
| 2 | AFG | Afghanistan | 2020-01-02 | 0.498 | 0.0 | 0.0 | 0.0 | 17.477233 | 7.497754 |
| 3 | AFG | Afghanistan | 2020-01-03 | 0.498 | 0.0 | 0.0 | 0.0 | 17.477233 | 7.497754 |
| 4 | AFG | Afghanistan | 2020-01-04 | 0.498 | 0.0 | 0.0 | 0.0 | 17.477233 | 7.497754 |
data2.head()
| | iso_code | location | date | total_cases | total_deaths | stringency_index | population | gdp_per_capita | human_development_index | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | 2019-12-31 | 0.0 | 0.0 | 0.0 | 38928341 | 1803.987 | 0.498 | #NUM! | #NUM! | #NUM! | 17.477233 | 7.497754494 |
| 1 | AFG | Afghanistan | 2020-01-01 | 0.0 | 0.0 | 0.0 | 38928341 | 1803.987 | 0.498 | #NUM! | #NUM! | #NUM! | 17.477233 | 7.497754494 |
| 2 | AFG | Afghanistan | 2020-01-02 | 0.0 | 0.0 | 0.0 | 38928341 | 1803.987 | 0.498 | #NUM! | #NUM! | #NUM! | 17.477233 | 7.497754494 |
| 3 | AFG | Afghanistan | 2020-01-03 | 0.0 | 0.0 | 0.0 | 38928341 | 1803.987 | 0.498 | #NUM! | #NUM! | #NUM! | 17.477233 | 7.497754494 |
| 4 | AFG | Afghanistan | 2020-01-04 | 0.0 | 0.0 | 0.0 | 38928341 | 1803.987 | 0.498 | #NUM! | #NUM! | #NUM! | 17.477233 | 7.497754494 |
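Comparing the two previews suggests that `POP` and `GDPCAP` in the transformed file are natural logarithms of `population` and `gdp_per_capita` in the raw file. This is an inference from the numbers shown above, not something the dataset documents, so treat the sketch below as a quick sanity check under that assumption:

```python
import math

# Values copied from the first Afghanistan row of data2.head() above.
raw_population = 38928341
raw_gdp_per_capita = 1803.987

# Values copied from the POP and GDPCAP columns of data.head() above.
transformed_pop = 17.477233
transformed_gdpcap = 7.497754

# Assumption: the transformation is a natural logarithm.
log_population = math.log(raw_population)
log_gdp_per_capita = math.log(raw_gdp_per_capita)

print(round(log_population, 6), transformed_pop)
print(round(log_gdp_per_capita, 6), transformed_gdpcap)
```

If the printed pairs agree, the transformed columns have to be read on a log scale, which matters when interpreting the `Population` values later in the article.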
After a first look at both datasets, I found that we have to combine them by creating a new dataset. But before we create it, let’s have a look at how many samples of each country are present in the dataset:
data["COUNTRY"].value_counts()
Afghanistan 294
Indonesia 294
Macedonia 294
Luxembourg 294
Lithuania 294
...
Tajikistan 172
Comoros 171
Lesotho 158
Hong Kong 51
Solomon Islands 4
Name: COUNTRY, Length: 210, dtype: int64
So we don’t have an equal number of samples of each country in the dataset. Let’s have a look at the mode value:
data["COUNTRY"].value_counts().mode()
0 294
Name: COUNTRY, dtype: int64
So 294 is the mode value. We will use it as the divisor when averaging the human development index, the stringency index, and the population of each country. Note that for countries with fewer than 294 rows, this divisor understates the average. Now let’s create a new dataset by combining the necessary columns from both the datasets:
# Aggregating the data
code = data["CODE"].unique().tolist()
country = data["COUNTRY"].unique().tolist()
hdi = []
tc = []
td = []
sti = []
# This list starts with the unique transformed POP values. zip() below stops at
# the shortest list, so the Population column is taken from the start of this
# list, not from the per-country figures appended in the loop.
population = data["POP"].unique().tolist()
gdp = []  # unused here; GDP per capita is added manually later

for i in country:
    # 294 is the mode of the per-country row counts
    hdi.append((data.loc[data["COUNTRY"] == i, "HDI"]).sum()/294)
    tc.append((data2.loc[data2["location"] == i, "total_cases"]).sum())
    td.append((data2.loc[data2["location"] == i, "total_deaths"]).sum())
    sti.append((data.loc[data["COUNTRY"] == i, "STI"]).sum()/294)
    population.append((data2.loc[data2["location"] == i, "population"]).sum()/294)

aggregated_data = pd.DataFrame(list(zip(code, country, hdi, tc, td, sti, population)),
                               columns=["Country Code", "Country", "HDI",
                                        "Total Cases", "Total Deaths",
                                        "Stringency Index", "Population"])
aggregated_data.head()
| | Country Code | Country | HDI | Total Cases | Total Deaths | Stringency Index | Population |
|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | 0.498000 | 5126433.0 | 165875.0 | 3.049673 | 17.477233 |
| 1 | ALB | Albania | 0.600765 | 1071951.0 | 31056.0 | 3.005624 | 14.872537 |
| 2 | DZA | Algeria | 0.754000 | 4893999.0 | 206429.0 | 3.195168 | 17.596309 |
| 3 | AND | Andorra | 0.659551 | 223576.0 | 9850.0 | 2.677654 | 11.254996 |
| 4 | AGO | Angola | 0.418952 | 304005.0 | 11820.0 | 2.965560 | 17.307957 |
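Dividing every sum by 294 lowers the average for countries that have fewer rows, and summing `total_cases` adds up every daily value. As an alternative, here is a minimal sketch using `groupby`. It runs on a tiny made-up frame (the country names and numbers are hypothetical), and it assumes that `total_cases` is a running total, so that the maximum per country is that country's case count. That assumption is suggested by the column name and the steadily rising values, but it is not confirmed by the dataset itself:

```python
import pandas as pd

# Hypothetical stand-in for the real files; names and numbers are made up.
sample = pd.DataFrame({
    "CODE": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "COUNTRY": ["Aland", "Aland", "Aland", "Borduria", "Borduria"],
    "HDI": [0.5, 0.5, 0.5, 0.8, 0.8],
    "STI": [0.0, 3.0, 3.0, 2.0, 4.0],
    "total_cases": [0, 10, 25, 5, 9],  # assumed to be cumulative per country
})

# mean divides by each country's own row count instead of a fixed 294;
# max picks the latest value of a running total instead of summing it.
per_country = sample.groupby(["CODE", "COUNTRY"], as_index=False).agg(
    HDI=("HDI", "mean"),
    STI=("STI", "mean"),
    total_cases=("total_cases", "max"),
)
print(per_country)
```

With this approach a country with only two rows keeps its real HDI instead of a value scaled down by the missing rows. The rest of the article continues with the original aggregation, so the figures in the following tables are the summed ones.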
I have not included the GDP per capita column yet because I didn’t find the correct figures for GDP per capita in the dataset. So it is better to collect the GDP per capita of the countries manually.
With so many countries in this data, collecting GDP per capita manually for all of them is not practical. So let’s select a subsample: the top 10 countries with the highest number of Covid-19 cases. It is a convenient sample for studying the economic impacts of Covid-19, although it is not representative of all countries. So let’s sort the data according to the total cases of Covid-19:
# Sorting Data According to Total Cases
data = aggregated_data.sort_values(by=["Total Cases"], ascending=False)
data.head()
| | Country Code | Country | HDI | Total Cases | Total Deaths | Stringency Index | Population |
|---|---|---|---|---|---|---|---|
| 200 | USA | United States | 0.92400 | 746014098.0 | 26477574.0 | 3.350949 | 19.617637 |
| 27 | BRA | Brazil | 0.75900 | 425704517.0 | 14340567.0 | 3.136028 | 19.174732 |
| 90 | IND | India | 0.64000 | 407771615.0 | 7247327.0 | 3.610552 | 21.045353 |
| 157 | RUS | Russia | 0.81600 | 132888951.0 | 2131571.0 | 3.380088 | 18.798668 |
| 150 | PER | Peru | 0.59949 | 74882695.0 | 3020038.0 | 3.430126 | 17.311165 |
Now here’s how we can select the top 10 countries with the highest number of cases:
# Top 10 Countries with Highest Covid Cases
data = data.head(10).copy()  # copy so that new columns can be added without a pandas warning
data
| | Country Code | Country | HDI | Total Cases | Total Deaths | Stringency Index | Population |
|---|---|---|---|---|---|---|---|
| 200 | USA | United States | 0.924000 | 746014098.0 | 26477574.0 | 3.350949 | 19.617637 |
| 27 | BRA | Brazil | 0.759000 | 425704517.0 | 14340567.0 | 3.136028 | 19.174732 |
| 90 | IND | India | 0.640000 | 407771615.0 | 7247327.0 | 3.610552 | 21.045353 |
| 157 | RUS | Russia | 0.816000 | 132888951.0 | 2131571.0 | 3.380088 | 18.798668 |
| 150 | PER | Peru | 0.599490 | 74882695.0 | 3020038.0 | 3.430126 | 17.311165 |
| 125 | MEX | Mexico | 0.774000 | 74347548.0 | 7295850.0 | 3.019289 | 18.674802 |
| 178 | ESP | Spain | 0.887969 | 73717676.0 | 5510624.0 | 3.393922 | 17.660427 |
| 175 | ZAF | South Africa | 0.608653 | 63027659.0 | 1357682.0 | 3.364333 | 17.898266 |
| 42 | COL | Colombia | 0.581847 | 60543682.0 | 1936134.0 | 3.357923 | 17.745037 |
| 199 | GBR | United Kingdom | 0.922000 | 59475032.0 | 7249573.0 | 3.353883 | 18.033340 |
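The sort-then-head steps above can also be written as a single call. A minimal sketch with pandas' `nlargest`, using a hypothetical four-row miniature of `aggregated_data` (the figures are copied from the tables above):

```python
import pandas as pd

# Hypothetical miniature of aggregated_data; figures taken from the tables above.
aggregated_sample = pd.DataFrame({
    "Country": ["Peru", "United States", "India", "Brazil"],
    "Total Cases": [74882695.0, 746014098.0, 407771615.0, 425704517.0],
})

# nlargest orders the rows by the column, descending, and keeps the first n.
top3 = aggregated_sample.nlargest(3, "Total Cases")
print(top3)
```

On the full data, `aggregated_data.nlargest(10, "Total Cases")` should select the same ten countries as the two-step version.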
Now I will add two more columns (GDP per capita before Covid-19, GDP per capita during Covid-19) to this dataset:
data["GDP Before Covid"] = [65279.53, 8897.49, 2100.75,
                            11497.65, 7027.61, 9946.03,
                            29564.74, 6001.40, 6424.98, 42354.41]
data["GDP During Covid"] = [63543.58, 6796.84, 1900.71,
                            10126.72, 6126.87, 8346.70,
                            27057.16, 5090.72, 5332.77, 40284.64]
data
| | Country Code | Country | HDI | Total Cases | Total Deaths | Stringency Index | Population | GDP Before Covid | GDP During Covid |
|---|---|---|---|---|---|---|---|---|---|
| 200 | USA | United States | 0.924000 | 746014098.0 | 26477574.0 | 3.350949 | 19.617637 | 65279.53 | 63543.58 |
| 27 | BRA | Brazil | 0.759000 | 425704517.0 | 14340567.0 | 3.136028 | 19.174732 | 8897.49 | 6796.84 |
| 90 | IND | India | 0.640000 | 407771615.0 | 7247327.0 | 3.610552 | 21.045353 | 2100.75 | 1900.71 |
| 157 | RUS | Russia | 0.816000 | 132888951.0 | 2131571.0 | 3.380088 | 18.798668 | 11497.65 | 10126.72 |
| 150 | PER | Peru | 0.599490 | 74882695.0 | 3020038.0 | 3.430126 | 17.311165 | 7027.61 | 6126.87 |
| 125 | MEX | Mexico | 0.774000 | 74347548.0 | 7295850.0 | 3.019289 | 18.674802 | 9946.03 | 8346.70 |
| 178 | ESP | Spain | 0.887969 | 73717676.0 | 5510624.0 | 3.393922 | 17.660427 | 29564.74 | 27057.16 |
| 175 | ZAF | South Africa | 0.608653 | 63027659.0 | 1357682.0 | 3.364333 | 17.898266 | 6001.40 | 5090.72 |
| 42 | COL | Colombia | 0.581847 | 60543682.0 | 1936134.0 | 3.357923 | 17.745037 | 6424.98 | 5332.77 |
| 199 | GBR | United Kingdom | 0.922000 | 59475032.0 | 7249573.0 | 3.353883 | 18.033340 | 42354.41 | 40284.64 |
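Assigning plain lists works only as long as the values are typed in exactly the same order as the rows. A hedged alternative is to key the figures by country name and attach them with `map`. The sketch below uses three of the ten countries, and the names `top`, `gdp_before` and `gdp_during` are illustrative, not part of the original code:

```python
import pandas as pd

# Illustrative subset of the top-10 frame.
top = pd.DataFrame({"Country": ["United States", "Brazil", "India"]})

# Figures copied from the lists above, keyed by name so row order no longer matters.
gdp_before = {"United States": 65279.53, "Brazil": 8897.49, "India": 2100.75}
gdp_during = {"United States": 63543.58, "Brazil": 6796.84, "India": 1900.71}

top["GDP Before Covid"] = top["Country"].map(gdp_before)
top["GDP During Covid"] = top["Country"].map(gdp_during)

# A country missing from a dictionary becomes NaN instead of shifting values silently.
missing = top[["GDP Before Covid", "GDP During Covid"]].isna().any().any()
print(top)
print("Missing values:", missing)
```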
Now let’s start by analyzing the spread of Covid-19 in the ten countries with the highest number of cases. First, let’s look at the total cases in each of these countries:
figure = px.bar(data, y='Total Cases', x='Country',
                title="Countries with Highest Covid Cases")
figure.show()
We can see that the USA has a far higher number of Covid-19 cases than Brazil and India, which are in the second and third positions. Now let’s have a look at the total number of deaths among the countries with the highest number of Covid-19 cases:
figure = px.bar(data, y='Total Deaths', x='Country',
                title="Countries with Highest Deaths")
figure.show()
Just like the total number of Covid-19 cases, the USA is leading in deaths, with Brazil in the second position. Mexico, the United Kingdom, and India follow with nearly identical totals. One thing to notice here is that the death rate in India, Russia, and South Africa is comparatively low relative to their total number of cases. Now let’s compare the total number of cases and total deaths in all these countries:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["Total Cases"],
    name='Total Cases',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["Total Deaths"],
    name='Total Deaths',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()
Now let’s have a look at the percentage of total deaths and total cases among all the countries with the highest number of covid-19 cases:
# Percentage of Total Cases and Deaths
cases = data["Total Cases"].sum()
deceased = data["Total Deaths"].sum()
labels = ["Total Cases", "Total Deaths"]
values = [cases, deceased]
fig = px.pie(data, values=values, names=labels,
             title='Percentage of Total Cases and Deaths', hole=0.5)
fig.show()
Below is how you can calculate the death rate of Covid-19 cases:
death_rate = (data["Total Deaths"].sum() / data["Total Cases"].sum()) * 100
print("Death Rate = ", death_rate)
Death Rate = 3.6144212045653767
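The figure above is a single rate for all ten countries combined. The same ratio can be computed per country, which makes the differences between countries visible. A minimal sketch using three rows copied from the table above (the frame name `rates` is illustrative). Like the overall figure, these are ratios of the aggregated totals in the table:

```python
import pandas as pd

# Three rows copied from the top-10 table above.
rates = pd.DataFrame({
    "Country": ["United States", "India", "Mexico"],
    "Total Cases": [746014098.0, 407771615.0, 74347548.0],
    "Total Deaths": [26477574.0, 7247327.0, 7295850.0],
})

# Element-wise division gives one rate per row.
rates["Death Rate (%)"] = rates["Total Deaths"] / rates["Total Cases"] * 100
print(rates.sort_values("Death Rate (%)", ascending=False))
```

On the full frame the same line would be `data["Total Deaths"] / data["Total Cases"] * 100`.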
Another important column in this dataset is the stringency index. It is a composite measure of response indicators, including school closures, workplace closures, and travel bans. It indicates how strict these government measures to control the spread of Covid-19 are in each country:
fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='Stringency Index', height=400,
             title="Stringency Index during Covid-19")
fig.show()
Here we can see that India has the highest stringency index among these countries during the outbreak of Covid-19.
Now let’s analyze the impact of Covid-19 on the economy. Here GDP per capita is the primary factor for analyzing the economic slowdown caused by the outbreak of Covid-19. Let’s have a look at the GDP per capita before the outbreak of Covid-19 among the countries with the highest number of Covid-19 cases:
fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='GDP Before Covid', height=400,
             title="GDP Per Capita Before Covid-19")
fig.show()
Now let’s have a look at the GDP per capita during the rise in the cases of covid-19:
fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='GDP During Covid', height=400,
             title="GDP Per Capita During Covid-19")
fig.show()
Now let’s compare the GDP per capita before covid-19 and during covid-19 to have a look at the impact of covid-19 on the GDP per capita:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["GDP Before Covid"],
    name='GDP Per Capita Before Covid-19',
    marker_color='indianred'
))
fig.add_trace(go.Bar(
    x=data["Country"],
    y=data["GDP During Covid"],
    name='GDP Per Capita During Covid-19',
    marker_color='lightsalmon'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()
You can see a drop in GDP per capita in all the countries with the highest number of covid-19 cases.
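To put a number on that drop, the relative change can be computed directly from the two columns. A minimal sketch with the figures entered earlier (the frame name `gdp` and the column name `Change (%)` are illustrative):

```python
import pandas as pd

# GDP per capita figures copied from the two lists entered earlier.
gdp = pd.DataFrame({
    "Country": ["United States", "Brazil", "India", "Russia", "Peru",
                "Mexico", "Spain", "South Africa", "Colombia", "United Kingdom"],
    "GDP Before Covid": [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
                         9946.03, 29564.74, 6001.40, 6424.98, 42354.41],
    "GDP During Covid": [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
                         8346.70, 27057.16, 5090.72, 5332.77, 40284.64],
})

# Negative values mean GDP per capita fell relative to the pre-Covid figure.
gdp["Change (%)"] = (
    (gdp["GDP During Covid"] - gdp["GDP Before Covid"])
    / gdp["GDP Before Covid"] * 100
)
print(gdp.sort_values("Change (%)")[["Country", "Change (%)"]])
```

Sorting by the new column shows which of the ten countries saw the largest relative decline, something the grouped bar chart hides because of the very different GDP levels.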
Another important economic factor is the Human Development Index. It is a composite statistical index of life expectancy, education, and per capita income indicators. Let’s have a look at the human development index of the countries with the highest number of Covid-19 cases:
fig = px.bar(data, x='Country', y='Total Cases',
             hover_data=['Population', 'Total Deaths'],
             color='HDI', height=400,
             title="Human Development Index during Covid-19")
fig.show()
So this is how we can analyze the spread of Covid-19 and its impact on the economy.
In this task, we studied the spread of Covid-19 among the countries and its impact on the global economy. We saw that the United States had the highest number of Covid-19 cases and deaths. One possible contributing factor is the stringency index of the United States, which is comparatively low for its population, although this analysis does not establish a cause. We also analyzed how the GDP per capita of the ten most affected countries changed during the outbreak of Covid-19. I hope you liked this article on Covid-19 impacts analysis using Python. Feel free to ask valuable questions in the comments section below.